Writing productive stencil codes with overlapped tiling ‡ 3

نویسندگان

  • Jia Guo
  • Ganesh Bikshandi
  • Basilio B. Fraguela
چکیده

Stencil computations constitute the kernel of many scientific applications. Tiling is often used to improve 11 the performance of stencil codes for data locality and parallelism. However, tiled stencil codes typically require shadow regions, whose management becomes a burden to programmers. In fact, it is often the 13 case that the code required to manage these regions, and in particular their updates, is much longer than the computational kernel of the stencil. As a result, shadow regions usually impact programmers’ 15 productivity negatively. In this paper, we describe overlapped tiling, a construct that supports shadow regions in a convenient, flexible and efficient manner in the context of the hierarchically tiled array 17 (HTA) data type. The HTA is a class designed to express algorithms with a high degree of parallelism and/or locality as naturally as possible in terms of tiles. We discuss the syntax and implementation of 19 overlapped HTAs as well as our experience in rewriting parallel and sequential codes using them. The results have been satisfactory in terms of both productivity and performance. For example, overlapped 21 HTAs reduced the number of communication statements in non-trivial codes by 78% on average while speeding them up. We also examine different implementation options and compare overlapped HTAs with 23 previous approaches. Copyright © 2008 John Wiley & Sons, Ltd.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Design and Use of htalib - A Library for Hierarchically Tiled Arrays

Hierarchically Tiled Arrays (HTAs) are data structures that facilitate locality and parallelism of array intensive computations with block-recursive nature. The model underlying HTAs provides programmers with a global view of distributed data as well as a single-threaded view of the execution. In this paper we present htalib, a C++ implementation of HTAs. This library provides several novel con...

متن کامل

A Domain-Specific Language and Compiler for Stencil Computations on Short-Vector SIMD and GPU Architectures

Stencil computations are an integral part of applications in a number of scientific computing domains, such as image processing and partial differential equations. We describe a domain-specific language for regular stencil computations, that allows specification of the computations in a concise manner. We describe a multi-target compiler for this DSL, that generates optimized code for multi-cor...

متن کامل

A framework that supports in writing performance-optimized stencil-based codes

On modern multicore processors, many applications run much slower than one would expect when looking at the vast congregated computational power. After a discussion of factors determining the performance on such CPUs, a framework concept that simplifies writing performance-optimized codes for stencil-based algorithms and a prototypical implementation are presented. Finally, the suitability of t...

متن کامل

Locality Optimization of Stencil Applications Using Data Dependency Graphs

This paper proposes tiling techniques based on data dependencies and not in code structure. The work presented here leverages and expands previous work by the authors in the domain of non traditional tiling for parallel applications. The main contributions of this paper are: (1) A formal description of tiling from the point of view of the data produced and not from the source code. (2) A mathem...

متن کامل

University of Delaware Department of Electrical and Computer Engineering Computer Architecture and Parallel Systems Laboratory Diamond Tiling: A Tiling Framework for Time-iterated Scientific Applications

This paper fully develops Diamond Tiling, a technique to partition the computations of stencil applications such as FDTD. The Diamond Tiling technique is the result of optimizing the amount of useful computations that can be executed when a region of memory is loaded to the local memory of a multiprocessor chip. Diamond Tiling contributes to the state of the art on time tiling techniques in tha...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2009